Conversation

@corwinjoy
Contributor

Description

This PR adds encryption support and other advanced file options to delta-rs by implementing a comprehensive framework for file format settings. The changes enable users to configure encryption settings, customize writer properties, and apply file-level formatting options when reading and writing Delta tables.

  • Introduces a FileFormatOptions trait and related infrastructure to handle file-specific configurations
  • Adds support for both simple property-based encryption and KMS-based encryption through new factory patterns
  • Updates all operation builders to accept and propagate file format options throughout the write/read pipeline

In general, we have added a new trait called FileFormatOptions at the root DeltaTable level to unify how files within a delta table are read and written with specific formatting. The idea is that you can apply these settings once, at the top level, and then seamlessly perform any operations with the necessary settings.

This PR leverages the DataFusion TableOptions structure to support format options for multiple underlying file formats. (The idea is that delta-rs may eventually want to support storage formats beyond Parquet, such as Vortex or Lance.) Additionally, it centralizes file format options in a single, consistent location. This avoids the current difficulty where one has to set WriterProperties separately and then set reader properties as part of the SessionState. (This is in line with comments from @roeap about how file configuration might be improved: #3300 (comment)). We would also like to eventually extend this upgrade to record these file configurations in the delta table properties. For example, if the files are encrypted, one could add a KMS configuration for where to retrieve encryption keys.

Review Suggestion

This PR turned out to be larger than we hoped, so apologies for that, but I don't know how to split it into smaller pieces.
When reviewing, we suggest starting with the file crates/core/src/table/file_format_options.rs to get an overview of the new file format trait that can be applied to delta tables.

Related Issue(s)

Support Parquet Modular Encryption:
#3300

Documentation

Parquet Modular Encryption: https://docs.google.com/document/d/1MUg1J7u5VdLkgejJ4ybzfZt1OmwhQkq2iGPxsn4gqLI/edit?tab=t.0#heading=h.34wvmhc1zdch

Attribution

This PR was created in collaboration with @adamreeve

@github-actions github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Sep 29, 2025
@github-actions

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@corwinjoy
Contributor Author

Note that fully supporting Parquet encryption requires being able to get write and read properties per-file, which is why the existing ability to set WriterProperties isn't sufficient, and why WriterPropertiesFactory::create_writer_properties is called per file and requires a file path. This allows generating new random data encryption keys per file and performing tasks such as specifying a per-file AAD prefix or supporting the external storage of encryption keys that can be looked up using the file path.
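To make the per-file requirement concrete, here is a minimal sketch of a factory that is asked for settings once per file and uses the file path both to look up a key and to derive an AAD prefix. Everything in it is hypothetical and std-only: `FileEncryptionSettings`, `PerFileSettingsFactory`, and `KeyStoreFactory` are illustrative stand-ins, not the types in this PR, and the in-memory map merely plays the role of external key storage.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the encryption-related part of per-file writer properties.
#[derive(Debug, Clone, PartialEq)]
pub struct FileEncryptionSettings {
    pub data_key: Vec<u8>,
    pub aad_prefix: String,
}

/// Hypothetical factory trait: invoked once per file, with that file's path.
pub trait PerFileSettingsFactory {
    fn create_settings(&self, file_path: &str) -> Result<FileEncryptionSettings, String>;
}

/// In-memory map standing in for an external key store indexed by file path.
pub struct KeyStoreFactory {
    keys_by_path: HashMap<String, Vec<u8>>,
}

impl KeyStoreFactory {
    pub fn new(keys_by_path: HashMap<String, Vec<u8>>) -> Self {
        Self { keys_by_path }
    }
}

impl PerFileSettingsFactory for KeyStoreFactory {
    fn create_settings(&self, file_path: &str) -> Result<FileEncryptionSettings, String> {
        // The path is the lookup key, so each file can get its own data key.
        let data_key = self
            .keys_by_path
            .get(file_path)
            .cloned()
            .ok_or_else(|| format!("no key registered for {file_path}"))?;
        Ok(FileEncryptionSettings {
            data_key,
            // Illustrative per-file AAD prefix derived from the path.
            aad_prefix: format!("aad:{file_path}"),
        })
    }
}
```

A single, table-wide properties object could not express this, since both the key and the AAD prefix differ from file to file.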

@corwinjoy
Contributor Author

@rtyler @roeap @alamb Tagging you here per our previous discussion on adding encryption support to delta-rs.

@corwinjoy corwinjoy changed the title from "feat: Add framework for File Format Options" to "feat: add framework for File Format Options" Sep 29, 2025
@rtyler rtyler self-assigned this Sep 30, 2025
@rtyler rtyler marked this pull request as draft September 30, 2025 13:18
@rtyler
Member

rtyler commented Sep 30, 2025

I have marked this pull request as draft. This does not compile as is, I can come back to it once it is able to compile and pass unit tests

@corwinjoy
Contributor Author

I have marked this pull request as draft. This does not compile as is, I can come back to it once it is able to compile and pass unit tests

@rtyler OK. It seems that when I auto-merged the main branch it introduced a build error. I have resolved this and the code is once again building and passing unit tests.

@corwinjoy corwinjoy marked this pull request as ready for review October 1, 2025 01:31
Collaborator

@ion-elgreco ion-elgreco left a comment

I can see the benefit, but we really need to reduce the surface of the changes that are being introduced.

@roeap
Collaborator

roeap commented Oct 1, 2025

@corwinjoy - awesome to see this come to fruition! Will find some time to give this a review hopefully tomorrow.

At first glance one quick question. Do we see a way to "bundle" the datafusion specific stuff a bit more? It's a bit hard to keep track of all the individual flags while reviewing :)

@corwinjoy
Contributor Author

@roeap

At first glance one quick question. Do we see a way to "bundle" the datafusion specific stuff a bit more? It's a bit hard to keep track of all the individual flags while reviewing :)

What we did to minimize this dependency is define an abstract FileFormatOptions trait. Everything just passes around a FileFormatRef defined as Arc<dyn FileFormatOptions>. Then, only when needed, do we grab final table options or writer properties. Furthermore, we've gated these instances of getting final details behind three function calls in file_format_options.rs:

pub fn build_writer_properties_factory_ffo(
    file_format_options: Option<FileFormatRef>,
) -> Option<Arc<dyn WriterPropertiesFactory>> {...}

pub fn to_table_parquet_options_from_ffo(
    file_format_options: Option<&FileFormatRef>,
) -> Option<TableParquetOptions> {...}

pub fn state_with_file_format_options(
    state: SessionState,
    file_format_options: Option<&FileFormatRef>,
) -> DeltaResult<SessionState> {...}

There might be some ways to refine this further, but in general we've tried to isolate and abstract these file properties where possible and not require datafusion.
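The pattern described here, passing an opaque handle everywhere and unpacking it in only a few helper functions, can be sketched roughly as follows. This is a simplified, std-only illustration: `FormatOptions`, `FormatRef`, `CompressionOnly`, `WritePlan`, `plan_write`, and `resolve_writer_settings` are hypothetical names, and a plain string map stands in for the real table options and writer properties.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a file format options trait.
pub trait FormatOptions: Send + Sync + std::fmt::Debug {
    /// Writer settings as key/value pairs; resolved only where files are written.
    fn writer_settings(&self) -> HashMap<String, String>;
}

/// Shared, type-erased handle that intermediate layers pass around.
pub type FormatRef = Arc<dyn FormatOptions>;

/// One concrete implementation, used only for illustration.
#[derive(Debug)]
pub struct CompressionOnly {
    codec: String,
}

impl CompressionOnly {
    pub fn new(codec: &str) -> Self {
        Self { codec: codec.to_string() }
    }
}

impl FormatOptions for CompressionOnly {
    fn writer_settings(&self) -> HashMap<String, String> {
        HashMap::from([("compression".to_string(), self.codec.clone())])
    }
}

/// Intermediate layers carry the handle without inspecting it.
pub struct WritePlan {
    pub target: String,
    pub format: Option<FormatRef>,
}

pub fn plan_write(target: &str, format: Option<FormatRef>) -> WritePlan {
    WritePlan { target: target.to_string(), format }
}

/// The single place in this sketch that unpacks the trait object.
pub fn resolve_writer_settings(options: Option<&FormatRef>) -> Option<HashMap<String, String>> {
    options.map(|o| o.writer_settings())
}
```

Because only `resolve_writer_settings` looks inside the handle, engine-specific types would stay confined to that one function in a layout like this.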

@corwinjoy
Contributor Author

@roeap From a user point of view, we've tried hard to make the settings as easy as possible. This can be seen in crates/deltalake/examples/basic_operations_encryption.rs. Here, we demonstrate different kinds of operations on tables. (We have a more formal unit test at crates/core/tests/commands_with_encryption.rs.) These code examples all look like ordinary operations; all we needed was a common function call when creating DeltaOps:

async fn ops_with_crypto(
    uri: &str,
    file_format_options: &FileFormatRef,
) -> Result<DeltaOps, DeltaTableError> {
    // Build a file:// URL from the local table path.
    let prefix_uri = format!("file://{}", uri);
    let url = Url::parse(&prefix_uri).unwrap();
    let ops = DeltaOps::try_from_uri(url).await?;
    // Attach the format options once; later operations pick them up from `ops`.
    Ok(ops.with_file_format_options(file_format_options.clone()))
}

Calling with_file_format_options is sufficient to apply the needed encryption settings for all operations.
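As a rough illustration of why one call is enough, the sketch below shows an entry point that stores the options and hands a clone of the same `Arc` to every operation builder it creates. The types `Ops`, `WriteBuilder`, `OptimizeBuilder`, and `FormatSettings` are hypothetical simplifications; only the method name `with_file_format_options` is borrowed from the snippet above, and the real builders carry far more state.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the shared file format settings.
#[derive(Debug, PartialEq)]
pub struct FormatSettings {
    pub encrypted: bool,
}

/// Hypothetical entry point that remembers the settings it was given.
#[derive(Clone, Default)]
pub struct Ops {
    format: Option<Arc<FormatSettings>>,
}

pub struct WriteBuilder {
    pub format: Option<Arc<FormatSettings>>,
}

pub struct OptimizeBuilder {
    pub format: Option<Arc<FormatSettings>>,
}

impl Ops {
    /// Set the options once, at the top level.
    pub fn with_file_format_options(mut self, format: Arc<FormatSettings>) -> Self {
        self.format = Some(format);
        self
    }

    /// Every builder created from `Ops` inherits the same shared settings.
    pub fn write(&self) -> WriteBuilder {
        WriteBuilder { format: self.format.clone() }
    }

    pub fn optimize(&self) -> OptimizeBuilder {
        OptimizeBuilder { format: self.format.clone() }
    }
}
```

Cloning the `Arc` is cheap and keeps all operations pointing at one settings object, so there is no per-operation configuration to forget.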

# Conflicts:
#	crates/core/src/delta_datafusion/table_provider.rs
#	crates/core/src/operations/delete.rs
#	crates/core/src/operations/drop_constraints.rs
#	crates/core/src/operations/filesystem_check.rs
#	crates/core/src/operations/load.rs
#	crates/core/src/operations/merge/mod.rs
#	crates/core/src/operations/mod.rs
#	crates/core/src/operations/optimize.rs
#	crates/core/src/operations/restore.rs
#	crates/core/src/operations/update.rs
#	crates/core/src/operations/write/mod.rs
#	crates/core/tests/command_optimize.rs
#	crates/core/tests/integration_datafusion.rs
Signed-off-by: Corwin Joy <[email protected]>
# Conflicts:
#	crates/core/src/operations/optimize.rs
Signed-off-by: Corwin Joy <[email protected]>
Signed-off-by: Corwin Joy <[email protected]>
Signed-off-by: Corwin Joy <[email protected]>
# Conflicts:
#	Cargo.toml
#	crates/core/src/operations/delete.rs
#	crates/core/src/operations/optimize.rs
#	crates/core/src/operations/update.rs
#	crates/core/src/operations/write/execution.rs
#	crates/core/src/operations/write/writer.rs
#	crates/core/src/table/builder.rs
Signed-off-by: Corwin Joy <[email protected]>
@corwinjoy
Contributor Author

@ion-elgreco OK. I have applied the changes you suggested from two weeks ago and re-merged the latest from main. So, I am ready for another review when you get the chance. Thanks again for the suggestions!


#[cfg(feature = "datafusion")]
#[derive(Clone, Debug, Default)]
pub struct SimpleFileFormatOptions {
Collaborator

What's the purpose for this for other delta-rs users?

Contributor Author

SimpleFileFormatOptions is a helper for users to easily create a FileFormatRef from TableOptions in order to set their read and write properties. We use it like:

let file_format_options = Arc::new(SimpleFileFormatOptions::new(tbl_options)) as FileFormatRef;

This is needed if users want to set their own TableOptions for reading and writing. In our examples, we only use this for encryption. But the idea is that users may set their own parquet options at the DeltaTable level. This default class is the easiest way to do it, holding TableOptions for reading and writing.

@corwinjoy corwinjoy force-pushed the file_format_options_squashed branch from 4b9b5d8 to e44dbf8 Compare November 5, 2025 22:28
@corwinjoy
Contributor Author

@ion-elgreco OK. I have added changes to address your latest requests.
In addition, I have merged the latest main.
Plus, I have added the update from @jgiannuzzi to support async KMS for the tests and example.
Please let me know if you have any further questions or are ready for another review.

@ion-elgreco
Collaborator

@roeap @roeap to me it looks to be in a mergeable state, maybe you guys can give a final thought on it?

Perhaps we should put this behind an encryption feature gate

@corwinjoy
Contributor Author

I had thought about doing this with an encryption feature gate, but left it out for this PR for a few reasons:

  1. Adding the gate makes both the code and its usage more complex, which makes the feature more of a hassle to use and to debug.
  2. Relative to the size of the delta-rs project, disabling the encryption feature won't save much on size.
  3. This may be cleaner to add as a follow-up rather than mixed in with an already longish PR.
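For a sense of the extra complexity in point 1, here is a small sketch of what gating might look like. The feature name `encryption` is the one proposed in the discussion; `EncryptionConfig` and `WriterConfig` are hypothetical types, not code from this PR. Note how the field, its initializer, and the accessor each need their own `cfg` attribute, and the accessor needs two variants.

```rust
/// Hypothetical encryption settings; the type only exists when the feature is on.
#[cfg(feature = "encryption")]
pub struct EncryptionConfig {
    pub key_id: String,
}

/// Hypothetical writer configuration with an optional, feature-gated field.
pub struct WriterConfig {
    pub compression: String,
    #[cfg(feature = "encryption")]
    pub encryption: Option<EncryptionConfig>,
}

impl WriterConfig {
    pub fn new(compression: &str) -> Self {
        Self {
            compression: compression.to_string(),
            // The initializer must be gated the same way as the field.
            #[cfg(feature = "encryption")]
            encryption: None,
        }
    }

    /// Variant compiled when the feature is enabled.
    #[cfg(feature = "encryption")]
    pub fn is_encrypted(&self) -> bool {
        self.encryption.is_some()
    }

    /// Fallback compiled when the feature is disabled.
    #[cfg(not(feature = "encryption"))]
    pub fn is_encrypted(&self) -> bool {
        false
    }
}
```

Both configurations would need to be built and tested, which is part of the maintenance cost weighed above.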
